chatgpt model


From Knowledge Generation to Knowledge Verification: Examining the BioMedical Generative Capabilities of ChatGPT

Hamed, Ahmed Abdeen, Lee, Byung Suk

arXiv.org Artificial Intelligence

The generative capabilities of LLMs present both opportunities for accelerating tasks and concerns about the authenticity of the knowledge they produce. To address these concerns, we present a computational approach that systematically evaluates the factual accuracy of biomedical knowledge that an LLM has been prompted to generate. Our approach encompasses two processes: the generation of disease-centric associations and their verification using the semantic knowledge of biomedical ontologies. Using ChatGPT as the selected LLM, we designed a set of prompt-engineering processes to generate linkages between diseases, drugs, symptoms, and genes to establish grounds for assessment. Experimental results demonstrate high accuracy in identifying disease terms (88%-97%), drug names (90%-91%), and genetic information (88%-98%). The symptom term identification accuracy was notably lower (49%-61%), with terms verified against the DOID, ChEBI, SYMPTOM, and GO ontologies, respectively. The verification of associations reveals literature coverage rates of 89%-91% for disease-drug and disease-gene associations. The low identification accuracy for symptom terms also lowered the verification rates of symptom-related associations (49%-62%).
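The verification step described above can be sketched in a few lines: generated terms are looked up in the corresponding ontology vocabulary, and identification accuracy is the fraction of terms found. The ontology excerpts and generated terms below are tiny hypothetical stand-ins, not the paper's actual data.

```python
def identification_accuracy(generated_terms, ontology_terms):
    """Fraction of generated terms that appear in the ontology vocabulary."""
    ontology = {t.lower() for t in ontology_terms}
    hits = sum(1 for t in generated_terms if t.lower() in ontology)
    return hits / len(generated_terms) if generated_terms else 0.0

# Hypothetical LLM output for a disease-centric prompt
generated = {
    "disease": ["asthma", "type 2 diabetes", "glioblastoma"],
    "symptom": ["wheezing", "feeling off"],  # vague symptom wording often misses
}

# Hypothetical ontology excerpts (stand-ins for DOID and SYMPTOM)
ontologies = {
    "disease": ["asthma", "type 2 diabetes", "glioblastoma", "melanoma"],
    "symptom": ["wheezing", "cough", "fatigue"],
}

for category, terms in generated.items():
    print(f"{category}: {identification_accuracy(terms, ontologies[category]):.0%}")
```

This toy run mirrors the paper's pattern: precisely named diseases verify well, while loosely phrased symptoms do not.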


Integrating AI in College Education: Positive yet Mixed Experiences with ChatGPT

Song, Xinrui, Zhang, Jiajin, Yan, Pingkun, Hahn, Juergen, Kruger, Uwe, Mohamed, Hisham, Wang, Ge

arXiv.org Artificial Intelligence

The integration of artificial intelligence (AI) chatbots into higher education marks a shift towards a new generation of pedagogical tools, mirroring the arrival of milestones like the internet. With the launch of ChatGPT-4 Turbo in November 2023, we developed a ChatGPT-based teaching application (https://chat.openai.com/g/g-1imx1py4K-chatge-medical-imaging) and integrated it into our undergraduate medical imaging course in the Spring 2024 semester. This study investigates the use of ChatGPT throughout a semester-long trial, providing insights into students' engagement, perception, and the overall educational effectiveness of the technology. We systematically collected and analyzed data concerning students' interaction with ChatGPT, focusing on their attitudes, concerns, and usage patterns. The findings indicate that ChatGPT offers significant advantages such as improved information access and increased interactivity, but its adoption is accompanied by concerns about the accuracy of the information provided and the necessity for well-defined guidelines to optimize its use.


Can We Use Large Language Models to Fill Relevance Judgment Holes?

Abbasiantaeb, Zahra, Meng, Chuan, Azzopardi, Leif, Aliannejadi, Mohammad

arXiv.org Artificial Intelligence

Incomplete relevance judgments limit the re-usability of test collections. When new systems are compared against the previous systems used to build the pool of judged documents, they are often at a disadvantage due to the ``holes'' in the test collection (i.e., pockets of un-assessed documents returned by the new system). In this paper, we take initial steps towards extending existing test collections by employing Large Language Models (LLMs) to fill the holes, leveraging and grounding the method with existing human judgments. We explore this problem in the context of Conversational Search using TREC iKAT, where information needs are highly dynamic and the responses (and the results retrieved) are much more varied, leaving bigger holes. While previous work has shown that automatic judgments from LLMs result in highly correlated rankings, we find substantially lower correlations when human plus automatic judgments are used (regardless of LLM, one/two/few-shot prompting, or fine-tuning). We further find that, depending on the LLM employed, new runs are highly favored (or penalized), and this effect is magnified in proportion to the size of the holes. Instead, one should generate the LLM annotations for the whole document pool to achieve rankings more consistent with human-generated labels. Future work on prompt engineering and fine-tuning is required to make LLMs reflect and represent human annotations, in order to ground and align the models so that they are more fit for purpose.
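The hole-filling setup can be sketched as follows: human labels are kept where they exist, and an LLM judge labels only the unjudged documents. Here `llm_judge` is a hypothetical stand-in for a prompted LLM; in the paper's setting it would be grounded with one/two/few-shot human examples.

```python
def fill_holes(pooled_docs, human_judgments, llm_judge):
    """Return a complete judgment dict: human labels where available,
    LLM labels for the unjudged 'holes'."""
    judgments = {}
    for doc_id in pooled_docs:
        if doc_id in human_judgments:
            judgments[doc_id] = human_judgments[doc_id]
        else:
            judgments[doc_id] = llm_judge(doc_id)
    return judgments

# Hypothetical data: d3 and d4 are holes left by a new system's retrieved results
pool = ["d1", "d2", "d3", "d4"]
human = {"d1": 2, "d2": 0}
filled = fill_holes(pool, human, llm_judge=lambda doc_id: 1)
print(filled)  # {'d1': 2, 'd2': 0, 'd3': 1, 'd4': 1}
```

Note that the paper's headline finding cuts against this mixed scheme: rankings are more consistent with human labels when the LLM annotates the whole pool rather than just the holes.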


CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models

Hajipour, Hossein, Hassler, Keno, Holz, Thorsten, Schönherr, Lea, Fritz, Mario

arXiv.org Artificial Intelligence

Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Their advances in competition-level programming problems have made them an essential pillar of AI-assisted pair programming, and tools such as GitHub Copilot have emerged as part of the daily programming workflow used by millions of developers. The training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure. While these models have been extensively assessed for their ability to produce functionally correct programs, there remains a lack of comprehensive investigations and benchmarks addressing the security aspects of these models. In this work, we propose a method to systematically study the security issues of code language models to assess their susceptibility to generating vulnerable code. To this end, we introduce the first approach to automatically find generated code that contains vulnerabilities in black-box code generation models. To achieve this, we present an approach to approximate inversion of the black-box code generation models based on few-shot prompting. We evaluate the effectiveness of our approach by examining code language models in generating high-risk security weaknesses. Furthermore, we establish a collection of diverse non-secure prompts for various vulnerability scenarios using our method. This dataset forms a benchmark for evaluating and comparing the security weaknesses in code language models.
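The few-shot inversion idea can be illustrated with a prompt-assembly sketch: given example pairs of (generated code, originating prompt), the model is asked to propose a prompt likely to elicit a target (vulnerable) snippet. The example pairs and template below are hypothetical, not the paper's actual prompts.

```python
# Hypothetical (code, prompt) demonstration pairs for few-shot inversion
FEW_SHOT_PAIRS = [
    ("cursor.execute('SELECT * FROM users WHERE id=' + uid)",
     "Write a Python function that looks up a user by id in SQLite."),
    ("os.system('ping ' + host)",
     "Write a Python function that pings a host given by the user."),
]

def build_inversion_prompt(target_code):
    """Assemble a few-shot prompt asking a model to reconstruct a prompt
    that would lead a black-box code LLM to produce `target_code`."""
    parts = []
    for code, prompt in FEW_SHOT_PAIRS:
        parts.append(f"Code:\n{code}\nPrompt: {prompt}")
    parts.append(f"Code:\n{target_code}\nPrompt:")
    return "\n\n".join(parts)

inversion_prompt = build_inversion_prompt(
    "subprocess.call('rm -rf ' + path, shell=True)")
print(inversion_prompt)
```

The completion the model writes after the final "Prompt:" is the approximate inverse; feeding it back to the code model tests whether the vulnerable pattern is reproduced.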


A Wide Evaluation of ChatGPT on Affective Computing Tasks

Amin, Mostafa M., Mao, Rui, Cambria, Erik, Schuller, Björn W.

arXiv.org Artificial Intelligence

With the rise of foundation models, a new artificial intelligence paradigm has emerged: instead of training a separate machine learning model for each problem, a general-purpose foundation model is simply prompted to solve it. Such models have been shown to have emergent properties, solving problems that they were not initially trained on. Studies of the effectiveness of such models are still quite limited. In this work, we widely study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13 affective computing problems: aspect extraction, aspect polarity classification, opinion extraction, sentiment analysis, sentiment intensity ranking, emotion intensity ranking, suicide tendency detection, toxicity detection, well-being assessment, engagement measurement, personality assessment, sarcasm detection, and subjectivity detection. We introduce a framework to evaluate the ChatGPT models on regression-based problems, such as intensity ranking, by modelling them as pairwise ranking classification. We compare ChatGPT against more traditional NLP methods, such as end-to-end recurrent neural networks and transformers. The results demonstrate the emergent abilities of the ChatGPT models on a wide range of affective computing problems: GPT-3.5, and especially GPT-4, show strong performance on many problems, particularly those related to sentiment, emotions, or toxicity. The ChatGPT models fall short on problems with implicit signals, such as engagement measurement and subjectivity detection.
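Casting a regression problem as pairwise ranking classification can be sketched as follows: a classifier answers "which of the two texts is more intense?", and a full ranking is recovered from pairwise wins. The comparison function here is a toy stand-in for the LLM's pairwise judgment, not the paper's actual prompt.

```python
from itertools import combinations

def rank_by_pairwise(items, more_intense):
    """Rank items by number of pairwise comparisons won.
    `more_intense(a, b)` returns True if a is judged more intense than b."""
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        if more_intense(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    return sorted(items, key=lambda it: wins[it], reverse=True)

# Toy stand-in for an LLM judgment: more exclamation marks = more intense
texts = ["fine.", "great!", "AMAZING!!!"]
ranking = rank_by_pairwise(texts, lambda a, b: a.count("!") > b.count("!"))
print(ranking)  # ['AMAZING!!!', 'great!', 'fine.']
```

This turns a hard-to-prompt regression target (a continuous intensity score) into a sequence of binary questions an instruction-following model can answer directly.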


DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Yao, Zhewei, Aminabadi, Reza Yazdani, Ruwase, Olatunji, Rajbhandari, Samyam, Wu, Xiaoxia, Awan, Ammar Ahmad, Rasley, Jeff, Zhang, Minjia, Li, Conglong, Holmes, Connor, Zhou, Zhongzhu, Wyatt, Michael, Smith, Molly, Kurilenko, Lev, Qin, Heyang, Tanaka, Masahiro, Che, Shuai, Song, Shuaiwen Leon, He, Yuxiong

arXiv.org Artificial Intelligence

ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning from Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.


Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection

Das, Mithun, Pandey, Saurabh Kumar, Mukherjee, Animesh

arXiv.org Artificial Intelligence

Hate speech is a severe issue that affects many online platforms. So far, several studies have been performed to develop robust hate speech detection systems. Large language models like ChatGPT have recently shown great promise in performing several tasks, including hate speech detection. However, it is crucial to understand the limitations of these models in order to build robust hate speech detection systems. To bridge this gap, our study evaluates the strengths and weaknesses of the ChatGPT model in detecting hate speech at a granular level across 11 languages. Our evaluation employs a series of functionality tests that reveal various intricate failures of the model which aggregate metrics like macro F1 or accuracy are unable to expose. In addition, we investigate the influence of complex emotional cues, such as the use of emojis in hate speech, on the performance of the ChatGPT model. Our analysis highlights the shortcomings of generative models in detecting certain types of hate speech and underscores the need for further research and improvements in the workings of these models.
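Functionality testing of the kind described can be sketched as grouping test cases by the capability they probe and scoring each group separately, so that failures an aggregate metric would average away become visible. The classifier and test cases below are toy stand-ins, not the paper's setup.

```python
def evaluate_by_functionality(classifier, test_cases):
    """test_cases: list of (text, expected_label, functionality).
    Returns accuracy broken down per functionality."""
    totals, correct = {}, {}
    for text, expected, func in test_cases:
        totals[func] = totals.get(func, 0) + 1
        if classifier(text) == expected:
            correct[func] = correct.get(func, 0) + 1
    return {f: correct.get(f, 0) / totals[f] for f in totals}

# Toy keyword classifier that misses emoji-based hate entirely
toy = lambda text: "hate" if "hate" in text else "not-hate"
cases = [
    ("I hate group X", "hate", "explicit"),
    ("group X \U0001F92E\U0001F92E", "hate", "emoji"),
    ("Wishing everyone well", "not-hate", "benign"),
]
report = evaluate_by_functionality(toy, cases)
print(report)
```

Here the per-functionality report exposes a 0% score on emoji-based cases that a single aggregate accuracy (67%) would mask.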


Azure OpenAI Service models - Azure OpenAI

#artificialintelligence

Azure OpenAI provides access to many different models, grouped by family and capability. A model family typically associates models by their intended task. The following table describes model families currently available in Azure OpenAI. Not all models are available in all regions currently. Each model family has a series of models that are further distinguished by capability.


ChatGPT: Revolutionizing the Conversational AI Landscape with Python

#artificialintelligence

Chatbots have become increasingly popular over the past few years, with more and more businesses and individuals turning to conversational AI technology to improve their customer service, automate routine tasks, and enhance user engagement. One of the most powerful chatbot development tools available today is ChatGPT, a cutting-edge natural language processing (NLP) technology built on the GPT-3.5 architecture and powered by Python. In this article, we'll take a closer look at ChatGPT and explore how it is revolutionizing the conversational AI landscape with Python. ChatGPT is an advanced conversational AI technology that is designed to understand and interpret human language in a way that is more accurate and contextually relevant than ever before. It is built on the GPT-3.5 architecture, which is a variant of the GPT-3 architecture that was trained on an even larger dataset of text.
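A minimal Python chatbot along these lines can be sketched with the OpenAI chat API. The model name and system prompt are illustrative choices, and the network call is kept inside a function so nothing runs without an API key.

```python
def build_messages(history, user_input,
                   system_prompt="You are a helpful support assistant."):
    """Assemble the messages list expected by the chat completions API."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # prior turns as {"role": ..., "content": ...}
    messages.append({"role": "user", "content": user_input})
    return messages

def chat_once(history, user_input, model="gpt-3.5-turbo"):
    """Send one conversational turn (requires OPENAI_API_KEY to be set)."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    response = client.chat.completions.create(
        model=model, messages=build_messages(history, user_input))
    return response.choices[0].message.content

msgs = build_messages([], "How do I reset my password?")
print(len(msgs))  # 2: the system prompt plus the user turn
```

Each call is stateless, so the application appends every exchange to `history` to give the model conversational context on the next turn.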


ChatGPT training, bowling and padel - Riihicloud

#artificialintelligence

The next task on my to-do list is to write a blog post about Riihisoft's and Riihicloud's ChatGPT training and recreation day.

"Welcome to a blog post about a training day for software developers, focused on ChatGPT! ChatGPT is an AI-based language model that can answer complex questions in natural language. This training day was specifically targeted at software developers who wanted to learn more about using ChatGPT and its potential in software development.

The training day began with a thorough introduction to ChatGPT and how it works. This was important because many of the participants had not used ChatGPT before and needed to understand its basic principles before trying it out in practice.

Next, the focus was on how ChatGPT can be used from a software development perspective. Participants tried out ChatGPT in different use cases, such as automating customer service, generating questions and answers, and searching databases. This gave them the opportunity to see how ChatGPT can be applied in software projects.

The main emphasis of the training day, however, was on practical exercises. Participants were given the task of training their own ChatGPT model to answer specific questions and integrating it into an existing software project. This exercise demonstrated how ChatGPT models can be trained for particular use cases and used to improve the functionality and usability of software.

The training day was a success, and participants gained many new ideas and insights into how ChatGPT can be used in software development. They also gained a good understanding of how ChatGPT can become a valuable tool for software developers in the future. This training day proved that ChatGPT has great potential to change the way software developers build software."